Advances in the Exon-Intron Database (EID)
Abstract
Investigation of exon-intron gene structures is a non-trivial task due to the enormous expansion of eukaryotic genomes, the great variety of gene forms, and imperfections in sequence data. A number of available information systems on various gene characteristics complement each other and are indispensable for many genomic studies. Among them, the Exon-Intron Database (EID) is a good choice for large-scale computational examination of exon/intron structure and splicing. It has many internal filters that control for sequence quality, consistency of gene descriptions, conformance to standards, and possible errors. Recent innovations in EID are described. The collection of exons and introns has been extended beyond coding regions, and current versions of EID also contain data on untranslated regions of gene sequences. Intron-less genes are included as a special part of EID. For species with entirely sequenced genomes, species-specific databases have been generated. A novel Mammalian Orthologous Intron Database (MOID) has been introduced, which includes the full set of introns from orthologous genes that occupy the same positions relative to the reading frames. Examples of statistical analyses of gene sequences using EID are provided. We present the latest data on our comparison of intron positions in 11,025 orthologous genes of human, mouse and rat, and find no convincing cases of intron gain. We discuss relevant data-quality issues of genomic databases. In particular, 5% of genes in genomic databases contain internal stop codons. This is due to a combination of biological causes and errors in sequence annotations. The EID is freely available at www.meduohio.edu/bioinfo/eid/.
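The internal stop codon issue mentioned in the abstract can be illustrated with a minimal sketch. This is not EID's own filter, whose implementation is not described here. The function name `internal_stop_positions` is hypothetical, and the sketch assumes the input is a spliced coding sequence already in frame and translated with the standard nuclear genetic code.

```python
# Illustrative only: assumes the standard nuclear genetic code and an in-frame CDS.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons,
    ignoring the final complete codon (the expected terminator)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop any trailing partial codon
    last_codon_start = usable - 3      # start of the final complete codon
    return [
        i
        for i in range(0, last_codon_start, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    # ATG TGA AAA TAA: the TGA at offset 3 is internal, the final TAA is not.
    print(internal_stop_positions("ATGTGAAAATAA"))
```

A gene flagged this way is not necessarily misannotated. As the abstract notes, such cases arise from biological causes as well as annotation errors, so a check like this only marks candidates for closer inspection.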
Similar resources
EID: the Exon-Intron Database, an exhaustive database of protein-coding intron-containing genes
To aid studies of molecular evolution and to assist in gene prediction research, we have constructed an Exon-Intron Database (EID) in FASTA format. Currently, the database is derived from GenBank release 112, and it contains 51 289 protein-coding genes (287 209 exons) that harbor introns, along with extensive descriptions of each gene and its DNA and protein sequences, as well as splice motif i...
Bioinformatic analysis of exon repetition, exon scrambling and trans-splicing in humans
MOTIVATION: Using bioinformatic approaches we aimed to characterize poorly understood abnormalities in splicing known as exon scrambling, exon repetition and trans-splicing. RESULTS: We developed a software package that allows large-scale comparison of all human expressed sequence tag (EST) sequences to the entire set of human gene sequences. Among 5,992,495 EST sequences, 401 cases of exon re...
Short nucleotide sequences signal spliceosomal binding in nucleic acids.
We have explored the regions around the splice sites of human introns and exons from the Exon-Intron Database (EID) and located a number of short 6-nucleotide and 7-nucleotide sequences that are relatively common in these regions. We expect these short sequences to play an important role in the selection of the appropriate splicing process. We propose that the external signals via short recogniti...
Novel Single Nucleotide Polymorphisms (SNPs) in Intron 2 and Exon 3 Regions of Leptin Gene in Sumba Ongole Cattle
The bovine leptin (LEP) gene has been widely used as a candidate gene for molecular selection to improve productivity traits of cattle. This study was carried out to identify single nucleotide polymorphisms (SNPs) in the LEP gene of Sumba Ongole (SO, Bos indicus) cows using a sequencing method. A total of 31 animals were used in this study for analyses. The results showed that a total of 16 SNPs w...
Loss of Chloroplast trnL(UAA) Intron in Two Species of Hedysarum (Fabaceae): Evolutionary Implications
Previous studies have indicated that in all land plants examined to date, the chloroplast gene trnL(UAA) is interrupted by a single group I intron ranging from 250 to over 1400 bp. The parasitic Epifagus virginiana has lost, however, the entire gene. We report that the intron is missing from the chloroplast genome of two arctic species of the legume genus Hedysarum (H. alpinum, H. ...
Journal title:
- Briefings in bioinformatics
Volume 7, Issue 2
Pages -
Publication date 2006